| Student | Matricola | Email |
|---|---|---|
| Protani Andrea | 1860126 | protani.1860126@studenti.uniroma1.it |
| Tromboni Gabriele | 2088799 | tromboni.2088799@studenti.uniroma1.it |
| Boesso Simone | 1800408 | boesso.1800408@studenti.uniroma1.it |
Load the pre-processed data contained in the file \(\texttt{hw2_data.RData}\). You will get two
lists of length 12, \(\texttt{asd_sel}\) and \(\texttt{td_sel}\) (one per group). In other
words we have data from 12 \(\texttt{ASD}\) subjects and 12 \(\texttt{TD}\) subjects. Each slot of these
lists contains a data.frame of size (145 × 116): the 116 columns are
related to different \(\texttt{ROIs}\),
whereas the 145 rows are the observation times. Data coming from
different subjects can/must also be considered independent from each
other.
Remark: you’ve 12 subjects per group \(\rightarrow\) decide how to
pool together their data (whatever the choice, please
explain).
Let \(\rho\) be the Pearson correlation. Set the threshold \(t\) equal to a meaningfully high (trial-and-error, explain your choice) percentile of the correlation values observed in the two groups (hint: you can simply use \(\texttt{cor()}\) inside an \(\texttt{lapply()}\) to make \(\texttt{asd_sel}\) and \(\texttt{td_sel}\) into lists of 116 × 116 correlation matrices). Use Fisher Z-transform + Bonferroni correction to get two separate estimates, \(\hat{\mathcal{G}}^{\text{ASD}}(t)\) and \(\hat{\mathcal{G}}^{\text{TD}}(t)\), of the true association graphs based on 95% asymptotic confidence intervals for \(\{\rho^\text{ASD}_{j,k} \}_{j,k}\) and \(\{\rho^\text{TD}_{j,k} \}_{j,k}\).
If you haven’t already, take a look at basic tools to deal with graphs in \(\texttt{R}\) such as the \(\texttt{igraph}\), \(\texttt{ggraph}\) packages. Graphically represent all the estimated graphs and try to draw some conclusion: are there clear co-activation differences between the two groups? What happens if you skip the Bonferroni correction and work with unadjusted intervals? For a fixed \(\alpha\), if you vary the threshold \(t\) from small to large values, what connections are the more “resilient”; that is, the last ones to die? Are there differences in the two groups?
Repeat the analysis using this time the partial correlation coefficient and compare the results.
To pool the data we followed 3 different approaches, resulting in 7 different trials. This choice was dictated by the fact that, having no prior knowledge of the subject matter, it was not clear to us from the outset which was the best way to address this specific problem. In this way, after also reading articles on the topic to deepen our theoretical understanding, we can make a more “data-driven” decision.
The different approaches and trials will consist of:
Let’s start by loading the data, the necessary libraries, defining some functions that we will need later and setting the values for some global variables.
### most relevant functions ###
# function to compute cor.test on a data frame (taken from the professor's scripts)
cor.mtest <- function(mat, conf.level = 0.95){
  mat <- as.matrix(mat)
  n <- ncol(mat)
  lowCI.mat <- uppCI.mat <- matrix(NA, n, n)
  diag(lowCI.mat) <- diag(uppCI.mat) <- 1
  for(i in 1:(n-1)){
    for(j in (i+1):n){
      tmp <- cor.test(mat[,i], mat[,j], conf.level = conf.level)
      lowCI.mat[i,j] <- lowCI.mat[j,i] <- tmp$conf.int[1]
      uppCI.mat[i,j] <- uppCI.mat[j,i] <- tmp$conf.int[2]
    }
  }
  return(list(low = lowCI.mat, up = uppCI.mat))
}
# to quickly compute the Bonferroni-corrected confidence level
bonferroni <- function(data, alpha = 0.05){
  m <- choose(ncol(data), 2)
  bonf <- 1 - alpha/m
  return(bonf)
}
# to quickly count how many correlations are statistically significant,
# i.e. how many confidence intervals lie entirely outside [-t, t]
significative_count <- function(conf_int, t){
  out <- sum((conf_int$up < -t) | (conf_int$low > t)) / 2
  return(out)
}
# to quickly compute the adjacency matrix of statistically significant correlations
adj_matrix <- function(conf_int, t){
  M <- 1*((conf_int$up < -t) | (conf_int$low > t))
  return(M)
}
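For intuition, with our 116 \(\texttt{ROIs}\) the corrected confidence level returned by \(\texttt{bonferroni()}\) is extremely close to 1; the following self-contained sketch (variable names are ours) shows the numbers involved:

```r
m <- choose(116, 2)     # 6670 pairwise correlations to test
conf <- 1 - 0.05 / m    # Bonferroni-corrected per-pair confidence level
round(conf, 7)          # very close to 1
qnorm((1 + conf) / 2)   # CI half-width multiplier grows from about 1.96 to about 4.48
```

This is why, in the tables below, the Bonferroni columns are so much sparser: each interval is more than twice as wide as the unadjusted one.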
We proceed with an exploratory analysis of the data by visualizing some plots.
We use histograms to visualize, for a fixed region of interest, the distributions of the data for the two groups of subjects (\(\texttt{TD}\) or \(\texttt{ASD}\)) and for the two laboratories (Caltech or Trinity) from which the observations come.
From the histograms it can be seen that there is a strong difference between the scales of the subjects from Caltech and those from Trinity. The histograms also look roughly bell-shaped and symmetric about 0.
Let’s also explore the correlations present between the first \(\texttt{ROIs}\) of our datasets to verify
if we can already detect any significant patterns (as an example we will
use a subject from Caltech for both groups).
It’s possible to immediately spot some \(\texttt{ROIs}\) that appear to be correlated.
Let’s now proceed with the implementation of the trials previously
proposed.
As a threshold we start with \(t=0\) and then raise it until we find an optimal level, and then further still until we find the most resilient correlations, i.e. the ‘last ones to die’.
We will work in parallel both with the confidence intervals calculated with \(\alpha=0.05\) and with the confidence intervals calculated after having performed the Bonferroni Correction.
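For reference, the asymptotic interval that \(\texttt{cor.test()}\) (and hence \(\texttt{cor.mtest()}\)) produces can be written out explicitly. The following is a minimal self-contained sketch with our own function name:

```r
# 95% asymptotic CI for a Pearson correlation via the Fisher Z-transform:
# z = atanh(r) is approximately Normal with standard deviation 1/sqrt(n-3).
fisher_ci <- function(r, n, conf.level = 0.95){
  z  <- atanh(r)                    # Fisher Z-transform
  se <- 1 / sqrt(n - 3)             # asymptotic standard error
  q  <- qnorm((1 + conf.level) / 2) # Normal quantile
  tanh(z + c(-1, 1) * q * se)       # back-transform to the correlation scale
}

fisher_ci(0.5, n = 145)   # roughly (0.37, 0.61)
```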
Since we had to carry out numerous tests, we created a function that collects the main steps needed to find the number of significant correlations, for both \(\texttt{ASD}\) and \(\texttt{TD}\), as well as the number of common and different significant correlations between the two groups.
The function in question is the following:
create_tables <- function(df1, df2, lista_q, alpha, type = 'pearson'){
  # df1 must be the data frame that refers to ASD subjects
  # df2 must be the data frame that refers to TD subjects
  # type must be 'pearson' or 'partial'
  # to choose the threshold t...
  corr_asd <- cor(df1)
  corr_td <- cor(df2)
  # remove the diagonal
  diag(corr_asd) <- diag(corr_td) <- 0
  # calculate the thresholds as quantiles of the pooled absolute correlations
  lista_t <- rep(NA, length(lista_q))
  for (i in 1:length(lista_q)) lista_t[i] <- quantile(abs(c(corr_asd, corr_td)), lista_q[i])
  # confidence intervals at level 1-alpha
  if (type == 'pearson'){
    conf_td <- cor.mtest(df2, 1-alpha)
    conf_asd <- cor.mtest(df1, 1-alpha)
  } else {
    conf_td <- cor.mtest_partial(df2, 1-alpha)
    conf_asd <- cor.mtest_partial(df1, 1-alpha)
  }
  diag(conf_td$low) <- diag(conf_td$up) <- 0
  diag(conf_asd$low) <- diag(conf_asd$up) <- 0
  # new confidence level from the Bonferroni Correction with the function previously created
  bonf <- bonferroni(df1, alpha)
  # confidence intervals adjusted for multiplicity
  if (type == 'pearson'){
    conf_td_bonf <- cor.mtest(df2, bonf)
    conf_asd_bonf <- cor.mtest(df1, bonf)
  } else {
    conf_td_bonf <- cor.mtest_partial(df2, bonf)
    conf_asd_bonf <- cor.mtest_partial(df1, bonf)
  }
  diag(conf_td_bonf$low) <- diag(conf_td_bonf$up) <- 0
  diag(conf_asd_bonf$low) <- diag(conf_asd_bonf$up) <- 0
  # initialize the matrix where we will store the results
  result_matrix <- matrix(NA, length(lista_t)*2, 4)
  for (i in 1:length(lista_t)){
    # set the threshold
    t <- lista_t[i]
    # count how many correlations are statistically
    # significant with the function previously created
    result_matrix[i*2-1, 1] <- significative_count(conf_td, t)
    result_matrix[i*2-1, 2] <- significative_count(conf_asd, t)
    result_matrix[i*2, 1] <- significative_count(conf_td_bonf, t)
    result_matrix[i*2, 2] <- significative_count(conf_asd_bonf, t)
    # build the adjacency matrices for ASD and TD
    # with the function previously created
    M_td <- adj_matrix(conf_td, t)
    M_asd <- adj_matrix(conf_asd, t)
    M_td_bonf <- adj_matrix(conf_td_bonf, t)
    M_asd_bonf <- adj_matrix(conf_asd_bonf, t)
    # adj. matrices for the different/common significant corr. between ASD and TD
    # (an entry of 2 means the edge is significant in both groups)
    M_diff <- M_same <- M_asd + M_td
    M_diff[M_diff == 2] <- 0
    M_same[M_same == 1] <- 0
    M_same[M_same == 2] <- 1
    # same, with the Bonferroni Correction
    M_diff_bonf <- M_same_bonf <- M_asd_bonf + M_td_bonf
    M_diff_bonf[M_diff_bonf == 2] <- 0
    M_same_bonf[M_same_bonf == 1] <- 0
    M_same_bonf[M_same_bonf == 2] <- 1
    result_matrix[i*2-1, 3] <- sum(M_same) / 2
    result_matrix[i*2-1, 4] <- sum(M_diff) / 2
    result_matrix[i*2, 3] <- sum(M_same_bonf) / 2
    result_matrix[i*2, 4] <- sum(M_diff_bonf) / 2
  }
  # put everything in a data frame to organize the results
  quantiles <- rep(lista_q, each = 2)
  threshold <- rep(round(lista_t, 3), each = 2)
  col_alpha <- rep(c('0.05', 'Bonf'), length(lista_t))
  result_matrix <- cbind(quantiles, threshold, col_alpha, result_matrix)
  tab <- data.frame(result_matrix)
  colnames(tab) <- c('Quantile', 'Threshold', 'Alpha CI', 'TD', 'ASD', 'Common', 'Different')
  return(tab)
}
Some further considerations on the chosen approaches:
About the first approach, we thought the relevant information could be extracted by dropping the sequential time order, keeping a single summary observation per subject. It is a group-independent approach: each subject’s summary is not influenced by the other subjects in the same group. In the end we get a 12x116 data frame for each group; here, each compression is done over 145 values.
About the second approach, the relevant information is extracted by summarizing, for each cerebral region and each time point, the observations of all the subjects in the same group. It is therefore more dependent on the features shown by the group. In the end we get a 145x116 data frame for each group; here, each compression is done over 12 values.
About the third approach, we decided to avoid summarizing over either subjects or time, simply concatenating the observations, so that relevant information could still be extracted and compared between the two groups. In the end we get a 1740x116 data frame for each group.
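On toy data shaped like ours, the across-time pooling (approach 1) and the concatenation (approach 3) amount to the following; \(\texttt{toy\_sel}\), \(\texttt{pool\_time}\) and \(\texttt{pool\_cat}\) are placeholder names of ours, and approach 2 requires the per-time-point restructuring performed further below in the report:

```r
set.seed(42)
# stand-in for td_sel / asd_sel: 12 subjects, each a 145 x 116 data frame
toy_sel <- lapply(1:12, function(s) as.data.frame(matrix(rnorm(145 * 116), 145, 116)))

# approach 1: summarize each subject across time -> 12 x 116
pool_time <- t(sapply(toy_sel, function(df) apply(df, 2, median)))

# approach 3: stack all observations -> 1740 x 116
pool_cat <- do.call(rbind, toy_sel)

dim(pool_time)   # 12 116
dim(pool_cat)    # 1740 116
```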
Trial 1: pool “across time” with the arithmetic mean.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 1297 | 807 | 199 | 1706 |
| 0 | 0 | Bonf | 52 | 8 | 1 | 58 |
| 0.3 | 0.167 | 0.05 | 735 | 346 | 61 | 959 |
| 0.3 | 0.167 | Bonf | 27 | 6 | 1 | 31 |
| 0.6 | 0.358 | 0.05 | 325 | 115 | 10 | 420 |
| 0.6 | 0.358 | Bonf | 11 | 1 | 0 | 12 |
| 0.7 | 0.433 | 0.05 | 224 | 68 | 7 | 278 |
| 0.7 | 0.433 | Bonf | 5 | 0 | 0 | 5 |
| 0.8 | 0.523 | 0.05 | 146 | 37 | 4 | 175 |
| 0.8 | 0.523 | Bonf | 5 | 0 | 0 | 5 |
| 0.9 | 0.646 | 0.05 | 69 | 14 | 2 | 79 |
| 0.9 | 0.646 | Bonf | 0 | 0 | 0 | 0 |
| 0.95 | 0.731 | 0.05 | 36 | 7 | 1 | 41 |
| 0.95 | 0.731 | Bonf | 0 | 0 | 0 | 0 |
| 0.99 | 0.862 | 0.05 | 5 | 0 | 0 | 5 |
| 0.99 | 0.862 | Bonf | 0 | 0 | 0 | 0 |
Trial 2: pool “across time” with the median.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 554 | 470 | 41 | 942 |
| 0 | 0 | Bonf | 2 | 1 | 0 | 3 |
| 0.3 | 0.137 | 0.05 | 262 | 220 | 10 | 462 |
| 0.3 | 0.137 | Bonf | 0 | 0 | 0 | 0 |
| 0.6 | 0.293 | 0.05 | 88 | 73 | 0 | 161 |
| 0.6 | 0.293 | Bonf | 0 | 0 | 0 | 0 |
| 0.7 | 0.358 | 0.05 | 61 | 40 | 0 | 101 |
| 0.7 | 0.358 | Bonf | 0 | 0 | 0 | 0 |
| 0.8 | 0.433 | 0.05 | 29 | 22 | 0 | 51 |
| 0.8 | 0.433 | Bonf | 0 | 0 | 0 | 0 |
| 0.9 | 0.541 | 0.05 | 8 | 4 | 0 | 12 |
| 0.9 | 0.541 | Bonf | 0 | 0 | 0 | 0 |
| 0.95 | 0.624 | 0.05 | 4 | 2 | 0 | 6 |
| 0.95 | 0.624 | Bonf | 0 | 0 | 0 | 0 |
| 0.99 | 0.753 | 0.05 | 0 | 0 | 0 | 0 |
| 0.99 | 0.753 | Bonf | 0 | 0 | 0 | 0 |
Trial 3: pool “across time” with the standard deviation.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 6305 | 4476 | 4325 | 2131 |
| 0 | 0 | Bonf | 721 | 103 | 33 | 758 |
| 0.3 | 0.644 | 0.05 | 1055 | 175 | 65 | 1100 |
| 0.3 | 0.644 | Bonf | 4 | 0 | 0 | 4 |
| 0.6 | 0.778 | 0.05 | 210 | 26 | 2 | 232 |
| 0.6 | 0.778 | Bonf | 1 | 0 | 0 | 1 |
| 0.7 | 0.813 | 0.05 | 122 | 15 | 1 | 135 |
| 0.7 | 0.813 | Bonf | 0 | 0 | 0 | 0 |
| 0.8 | 0.847 | 0.05 | 60 | 8 | 0 | 68 |
| 0.8 | 0.847 | Bonf | 0 | 0 | 0 | 0 |
| 0.9 | 0.885 | 0.05 | 22 | 3 | 0 | 25 |
| 0.9 | 0.885 | Bonf | 0 | 0 | 0 | 0 |
| 0.95 | 0.91 | 0.05 | 8 | 1 | 0 | 9 |
| 0.95 | 0.91 | Bonf | 0 | 0 | 0 | 0 |
| 0.99 | 0.946 | 0.05 | 1 | 0 | 0 | 1 |
| 0.99 | 0.946 | Bonf | 0 | 0 | 0 | 0 |
Comments:
In general this approach, i.e. pooling across time, was the one that convinced us the most. In this specific case, however, we have too few subjects to proceed this way: data frames of only 12 rows are not enough to reach conclusions with a minimum of credibility.
For this reason we discarded the trials based on this approach.
Let’s now proceed with the creation of the data frames based on the second approach, which we will need for trials 4, 5 and 6.
# initialize empty lists to store the rows
td_list <- list()
asd_list <- list()
# create the list of 145 data frames, each 12x116 (one row per subject per time point)
for (i in 1:145) {
  row_td <- lapply(td_sel, function(df) df[i,])
  row_asd <- lapply(asd_sel, function(df) df[i,])
  td_list[[i]] <- do.call(rbind, row_td)
  asd_list[[i]] <- do.call(rbind, row_asd)
}
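From lists shaped like \(\texttt{td\_list}\) (145 slots, each a 12x116 data frame), the across-subject summaries of trials 4-6 reduce to one row per time point. A self-contained sketch for the median, on toy data of the same shape (\(\texttt{pool\_median}\) and \(\texttt{toy\_list}\) are our placeholder names):

```r
set.seed(7)
# stand-in for td_list: 145 time points, each a 12 x 116 data frame
toy_list <- lapply(1:145, function(i) as.data.frame(matrix(rnorm(12 * 116), 12, 116)))

# summarize the 12 subjects at every time point -> a single 145 x 116 data frame
pool_median <- function(lst)
  as.data.frame(do.call(rbind, lapply(lst, function(df) apply(df, 2, median))))

toy_median <- pool_median(toy_list)
dim(toy_median)   # 145 116
```

Swapping `median` for `mean` or `sd` gives the inputs for trials 4 and 6 in the same way.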
Trial 4: pool “across subject” with the arithmetic mean.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 3238 | 3039 | 1762 | 2753 |
| 0 | 0 | Bonf | 781 | 678 | 314 | 831 |
| 0.3 | 0.086 | 0.05 | 1936 | 1733 | 859 | 1951 |
| 0.3 | 0.086 | Bonf | 414 | 344 | 171 | 416 |
| 0.6 | 0.188 | 0.05 | 906 | 809 | 369 | 977 |
| 0.6 | 0.188 | Bonf | 212 | 157 | 79 | 211 |
| 0.7 | 0.231 | 0.05 | 649 | 540 | 249 | 691 |
| 0.7 | 0.231 | Bonf | 139 | 113 | 54 | 144 |
| 0.8 | 0.289 | 0.05 | 416 | 347 | 173 | 417 |
| 0.8 | 0.289 | Bonf | 94 | 70 | 35 | 94 |
| 0.9 | 0.367 | 0.05 | 245 | 183 | 86 | 256 |
| 0.9 | 0.367 | Bonf | 46 | 42 | 19 | 50 |
| 0.95 | 0.446 | 0.05 | 120 | 89 | 46 | 117 |
| 0.95 | 0.446 | Bonf | 25 | 17 | 7 | 28 |
| 0.99 | 0.602 | 0.05 | 24 | 14 | 5 | 28 |
| 0.99 | 0.602 | Bonf | 5 | 3 | 1 | 6 |
Trial 5: pool “across subject” with the median.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 2255 | 2153 | 975 | 2458 |
| 0 | 0 | Bonf | 298 | 236 | 85 | 364 |
| 0.3 | 0.063 | 0.05 | 1321 | 1216 | 452 | 1633 |
| 0.3 | 0.063 | Bonf | 158 | 112 | 47 | 176 |
| 0.6 | 0.14 | 0.05 | 610 | 513 | 175 | 773 |
| 0.6 | 0.14 | Bonf | 65 | 55 | 20 | 80 |
| 0.7 | 0.174 | 0.05 | 430 | 361 | 130 | 531 |
| 0.7 | 0.174 | Bonf | 44 | 35 | 13 | 53 |
| 0.8 | 0.217 | 0.05 | 263 | 205 | 79 | 310 |
| 0.8 | 0.217 | Bonf | 30 | 16 | 9 | 28 |
| 0.9 | 0.281 | 0.05 | 136 | 92 | 40 | 148 |
| 0.9 | 0.281 | Bonf | 12 | 6 | 4 | 10 |
| 0.95 | 0.343 | 0.05 | 59 | 52 | 18 | 75 |
| 0.95 | 0.343 | Bonf | 5 | 3 | 1 | 6 |
| 0.99 | 0.471 | 0.05 | 11 | 6 | 4 | 9 |
| 0.99 | 0.471 | Bonf | 1 | 0 | 0 | 1 |
Trial 6: pool “across subject” with the standard deviation.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 1778 | 2002 | 701 | 2378 |
| 0 | 0 | Bonf | 167 | 181 | 65 | 218 |
| 0.3 | 0.057 | 0.05 | 928 | 1063 | 291 | 1409 |
| 0.3 | 0.057 | Bonf | 91 | 108 | 35 | 129 |
| 0.6 | 0.126 | 0.05 | 409 | 500 | 133 | 643 |
| 0.6 | 0.126 | Bonf | 54 | 52 | 15 | 76 |
| 0.7 | 0.156 | 0.05 | 293 | 351 | 95 | 454 |
| 0.7 | 0.156 | Bonf | 44 | 36 | 11 | 58 |
| 0.8 | 0.193 | 0.05 | 199 | 209 | 73 | 262 |
| 0.8 | 0.193 | Bonf | 33 | 27 | 10 | 40 |
| 0.9 | 0.25 | 0.05 | 102 | 114 | 39 | 138 |
| 0.9 | 0.25 | Bonf | 16 | 17 | 6 | 21 |
| 0.95 | 0.306 | 0.05 | 63 | 64 | 19 | 89 |
| 0.95 | 0.306 | Bonf | 11 | 11 | 4 | 14 |
| 0.99 | 0.443 | 0.05 | 15 | 15 | 6 | 18 |
| 0.99 | 0.443 | Bonf | 1 | 3 | 0 | 4 |
Comments:
Looking at the tables, the first thing one notices is that there are many more significant correlations for the mean and the median, while far fewer are found for the standard deviation, compared to the first approach. This makes sense, as we are going from 12-row data frames to 145-row data frames. The results are undoubtedly more accurate, but still not perfect: problems could arise from pooling subjects whose data were collected in different laboratories which, as we saw with the histograms, use different scales. Despite this, we decided to proceed without standardizing by laboratory: our goal is to verify whether there are significant differences between the \(\texttt{ASD}\) and \(\texttt{TD}\) groups, and we judged that the study of correlations would not be significantly affected by this issue.
Trial 7: concatenate all the observations together.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 5192 | 5101 | 4173 | 1947 |
| 0 | 0 | Bonf | 3497 | 3229 | 2312 | 2102 |
| 0.3 | 0.061 | 0.05 | 3482 | 3197 | 2289 | 2101 |
| 0.3 | 0.061 | Bonf | 2184 | 1879 | 1311 | 1441 |
| 0.6 | 0.134 | 0.05 | 1959 | 1669 | 1155 | 1318 |
| 0.6 | 0.134 | Bonf | 1136 | 934 | 650 | 770 |
| 0.7 | 0.168 | 0.05 | 1420 | 1206 | 841 | 944 |
| 0.7 | 0.168 | Bonf | 846 | 705 | 519 | 513 |
| 0.8 | 0.211 | 0.05 | 967 | 799 | 568 | 630 |
| 0.8 | 0.211 | Bonf | 576 | 499 | 370 | 335 |
| 0.9 | 0.284 | 0.05 | 503 | 433 | 326 | 284 |
| 0.9 | 0.284 | Bonf | 321 | 289 | 227 | 156 |
| 0.95 | 0.365 | 0.05 | 270 | 250 | 196 | 128 |
| 0.95 | 0.365 | Bonf | 193 | 169 | 133 | 96 |
| 0.99 | 0.57 | 0.05 | 52 | 41 | 29 | 35 |
| 0.99 | 0.57 | Bonf | 32 | 28 | 20 | 20 |
Comments:
This method is the one that allows us to find the largest number of
significant correlations but this is derived from too strong assumptions
made for the data pool. In fact, independence across time is already
forced, so considering everything independent turns out to be really too
much and leads to results affected by various problems. We get results
too accurate because having such a large number of information leads us
to obtain short confidence intervals and small standard errors. But the
problem is that the data are not really independent of each other.
Going into the specifics of why we tried several statistics:
- Arithmetic mean: since it is less resistant to outliers, we initially thought this might be a desirable feature to account for the variability of the observations. After carrying out the analysis and seeing the results, however, we changed our minds, since it seemed to yield too many significant correlations.
- Median: the opposite reasoning to the mean. This statistic was chosen because it resists outliers; it yields fewer significant correlations, but we considered that the right trade-off, as we want to be conservative when declaring correlations significant.
- Standard deviation: a different kind of statistic from the previous ones. Looking at finance and the studies carried out on volatility, we thought it might make sense to study the correlations of this statistic in our case too. We were not too convinced by the results; our takeaway is that it could be useful, but mostly in combination with the median or another statistic.
Following all this reasoning, we decided to proceed with the next points of the exercise based on trial 5, i.e. the data pooled across subjects using the median.
Moreover, the right compromise between a fairly high quantile and an acceptable number of significant correlations seems to us to be the 0.9 level, so that is what we will use.
Therefore by setting \(t=0.281\) we construct the adjacency matrices on which the graphs will be based.
bonf <- bonferroni(td2_median, alpha)
# confidence intervals at level 0.95
conf_asd <- cor.mtest(asd2_median, 1-alpha)
diag(conf_asd$low) <- diag(conf_asd$up) <- 0
conf_td <- cor.mtest(td2_median, 1-alpha)
diag(conf_td$low) <- diag(conf_td$up) <- 0
# confidence intervals adjusting for multiplicity with the Bonferroni correction
conf_asd_bonf <- cor.mtest(asd2_median, bonf)
diag(conf_asd_bonf$low) <- diag(conf_asd_bonf$up) <- 0
conf_td_bonf <- cor.mtest(td2_median, bonf)
diag(conf_td_bonf$low) <- diag(conf_td_bonf$up) <- 0
# setting the threshold to the value chosen before
t <- 0.281
# building the adjacency matrices for the two groups
M_asd <- 1*((conf_asd$up < -t) | (conf_asd$low > t))
M_bonf_asd <- 1*((conf_asd_bonf$up < -t) | (conf_asd_bonf$low > t))
M_td <- 1*((conf_td$up < -t) | (conf_td$low > t))
M_bonf_td <- 1*((conf_td_bonf$up < -t) | (conf_td_bonf$low > t))
# and now for the differences and the equal
M_asd_td_diff <- M_asd_td_same <- M_asd + M_td
M_asd_td_diff[M_asd_td_diff == 2] <- 0
M_asd_td_same[M_asd_td_same == 1] <- 0
M_asd_td_same[M_asd_td_same == 2] <- 1
M_bonf_asd_td_diff <- M_bonf_asd_td_same <- M_bonf_asd + M_bonf_td
M_bonf_asd_td_diff[M_bonf_asd_td_diff == 2] <- 0
M_bonf_asd_td_same[M_bonf_asd_td_same == 1] <- 0
M_bonf_asd_td_same[M_bonf_asd_td_same == 2] <- 1
# creating the graphs
G_asd <- graph_from_adjacency_matrix(M_asd, mode = "undirected")
G_td <- graph_from_adjacency_matrix(M_td, mode = "undirected")
G_diff <- graph_from_adjacency_matrix(M_asd_td_diff, mode = "undirected")
G_same <- graph_from_adjacency_matrix(M_asd_td_same, mode = "undirected")
Let’s also check which correlations between the \(\texttt{ROIs}\) are the most resilient. To do so, we raise the threshold to the highest value which, as shown in the table above, still yields some significant correlations. So let’s set \(t=0.343\) and use the confidence intervals calculated with the Bonferroni Correction.
# we set the threshold to the maximum value that allows us to find significant correlations
t <- 0.343
# building the adjacency matrices for the two groups
M_bonf_asd2 <- 1*((conf_asd_bonf$up < -t) | (conf_asd_bonf$low > t))
M_bonf_td2 <- 1*((conf_td_bonf$up < -t) | (conf_td_bonf$low > t))
resilient_td <- which(M_bonf_td2==1, arr.ind=T)
resilient_corr_td <- unique(t(apply(resilient_td, 1, sort)), MARGIN = 1)
resilient_corr_td <- apply(resilient_corr_td, 1, function(x) paste(x, collapse = "-"))
resilient_asd <- which(M_bonf_asd2==1, arr.ind=T)
resilient_corr_asd <- unique(t(apply(resilient_asd, 1, sort)), MARGIN = 1)
resilient_corr_asd <- apply(resilient_corr_asd, 1, function(x) paste(x, collapse = "-"))
resilient_corr_asd <- c(resilient_corr_asd, rep("", 2))
resilient_matrix <- cbind(resilient_corr_td, resilient_corr_asd)
tab <- data.frame(resilient_matrix)
colnames(tab) <- c('ROIs in TD subject', 'ROIs in ASD subject')
kable(tab, "markdown", align = "c", booktabs = TRUE, longtable = TRUE,
caption = "Resilient Correlations", escape = TRUE)
| ROIs in TD subject | ROIs in ASD subject |
|---|---|
| 11-13 | 45-46 |
| 29-73 | 48-111 |
| 43-44 | 81-82 |
| 48-111 | |
| 97-111 | |
These are “the last ones to die”. The only pair present in both the \(\texttt{ASD}\) and \(\texttt{TD}\) groups is \(\texttt{ROIs}\) 48-111, and we can also notice that \(\texttt{ROI}\) 111 is the only one to appear in two different pairs in the \(\texttt{TD}\) group.
Let’s now visualize the plots of the graphs based on the confidence interval at level \(1-\alpha\) with \(\alpha=0.05\).
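The graph figures were drawn with \(\texttt{igraph}\); the following is a minimal self-contained sketch of the kind of call used (10-node toy adjacency matrix; the circular layout and colours are our choices, not necessarily the ones in the figures):

```r
library(igraph)

set.seed(2)
# toy symmetric 0/1 adjacency matrix standing in for M_td / M_asd (116 x 116 in the report)
A <- matrix(rbinom(100, 1, 0.2), 10, 10)
A[lower.tri(A)] <- t(A)[lower.tri(A)]   # symmetrize
diag(A) <- 0                            # no self-loops

G <- graph_from_adjacency_matrix(A, mode = "undirected")
plot(G,
     layout = layout_in_circle(G),   # a circular layout keeps many nodes readable
     vertex.size = 6, vertex.label.cex = 0.7,
     vertex.color = "lightblue", edge.color = "grey50")
```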
And now using the ones based on the confidence intervals after the Bonferroni Correction.
From the graphs, but also from the table for trial 5 shown previously, we can first of all appreciate the clear difference between using the Bonferroni Correction and not using it: the correction seems necessary to fix the multiplicity problem and avoid believing in a large number of spurious significant correlations. Examining now the differences between the two groups, the number of significant correlations remains more or less the same, but the pairs of \(\texttt{ROIs}\) that are significant change. In fact, at every threshold tested the activations common to both groups are far fewer than the ones that differ, and this is well represented by these last graphs.
We can therefore cautiously say that there appear to be differences in the way \(\texttt{ROIs}\) react to stimuli between subjects with autism and subjects without.
We now turn the analysis to the Partial Correlation instead of the Pearson correlation. For the calculation we decided to use the approach based on linear regression followed by the correlation of the residuals. In particular, given a pair of \(\texttt{ROIs}\), we compute the residuals of the linear regression of the first \(\texttt{ROI}\) on the remaining 114 \(\texttt{ROIs}\), do the same for the second, and then compute the Pearson Correlation between the two sets of residuals; this procedure is repeated for each possible pair of the 116 \(\texttt{ROIs}\).
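The residual-based definition can be sanity-checked against the standard precision-matrix identity \(p_{jk} = -\Omega_{jk} / \sqrt{\Omega_{jj}\Omega_{kk}}\), where \(\Omega\) is the inverse of the covariance matrix. A small self-contained sketch (variable names are ours):

```r
set.seed(3)
# 200 observations of 4 variables, with column 2 driven by columns 1 and 3
X <- matrix(rnorm(200 * 4), 200, 4)
X[, 2] <- X[, 1] + X[, 3] + rnorm(200, sd = 0.5)

# residual-based partial correlation of columns 1 and 2 given the rest
rx <- lm(X[, 1] ~ X[, -c(1, 2)])$residuals
ry <- lm(X[, 2] ~ X[, -c(1, 2)])$residuals
p_res <- cor(rx, ry)

# precision-matrix formula
Om <- solve(cov(X))
p_prec <- -Om[1, 2] / sqrt(Om[1, 1] * Om[2, 2])

all.equal(p_res, p_prec)   # TRUE: the two definitions agree numerically
```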
To obtain this result we modify two previously used functions (\(\texttt{cor.test()}\) and \(\texttt{cor.mtest()}\)) so that they compute the Partial Correlation instead of the Pearson Correlation.
cor.test_partial <- function(x, y, n, D, conf.level){
  r <- cor(x, y)
  z <- atanh(r)
  # conditioning on the other D-2 ROIs changes sigma
  # from 1/sqrt(n-3) to 1/sqrt(n-(D-2)-3) = 1/sqrt(n-D-1)
  sigma <- 1 / sqrt(n - D - 1)
  cint <- z + c(-1, 1) * sigma * qnorm((1 + conf.level) / 2)
  cint <- tanh(cint)
  return(cint)
}

cor.mtest_partial <- function(mat, conf.level = 0.95){
  mat <- as.matrix(mat)
  D <- ncol(mat)
  n <- nrow(mat)
  lowCI.mat <- uppCI.mat <- matrix(NA, D, D)
  diag(lowCI.mat) <- diag(uppCI.mat) <- 1
  for(i in 1:(D-1)){
    for(j in (i+1):D){
      x <- mat[, i]
      y <- mat[, j]
      z <- mat[, -c(i, j)]
      # regress out the other D-2 ROIs and correlate the residuals
      res_x <- lm(x ~ z)$residuals
      res_y <- lm(y ~ z)$residuals
      tmp <- cor.test_partial(res_x, res_y, n, D, conf.level)
      lowCI.mat[i,j] <- lowCI.mat[j,i] <- tmp[1]
      uppCI.mat[i,j] <- uppCI.mat[j,i] <- tmp[2]
    }
  }
  return(list(low = lowCI.mat, up = uppCI.mat))
}
As done previously, we compute the number of significant correlations for each threshold. We decided to reuse the previous thresholds rather than compute new ones from the quantiles of the partial correlations: basing the comparison on \(t\) rather than on the quantile level makes it more direct.
| Quantile | Threshold | Alpha CI | TD | ASD | Common | Different |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.05 | 357 | 479 | 21 | 794 |
| 0 | 0 | Bonf | 0 | 0 | 0 | 0 |
| 0.3 | 0.063 | 0.05 | 157 | 234 | 2 | 387 |
| 0.3 | 0.063 | Bonf | 0 | 0 | 0 | 0 |
| 0.6 | 0.14 | 0.05 | 49 | 90 | 0 | 139 |
| 0.6 | 0.14 | Bonf | 0 | 0 | 0 | 0 |
| 0.7 | 0.174 | 0.05 | 25 | 50 | 0 | 75 |
| 0.7 | 0.174 | Bonf | 0 | 0 | 0 | 0 |
| 0.8 | 0.217 | 0.05 | 11 | 26 | 0 | 37 |
| 0.8 | 0.217 | Bonf | 0 | 0 | 0 | 0 |
| 0.9 | 0.281 | 0.05 | 3 | 9 | 0 | 12 |
| 0.9 | 0.281 | Bonf | 0 | 0 | 0 | 0 |
| 0.95 | 0.343 | 0.05 | 2 | 4 | 0 | 6 |
| 0.95 | 0.343 | Bonf | 0 | 0 | 0 | 0 |
| 0.99 | 0.471 | 0.05 | 0 | 0 | 0 | 0 |
| 0.99 | 0.471 | Bonf | 0 | 0 | 0 | 0 |
From the table we immediately notice a big difference between the two types of correlation. With the Partial Correlation, far fewer significant correlations are obtained; in particular, after the Bonferroni Correction no significant correlations are found at any threshold!
The last significant correlations are found at the 0.95 quantile level, which corresponds to \(t=0.343\). It is interesting to note that the significant correlations in the two groups are almost always different. This would again seem to suggest a difference between the two groups, but the table shows that this difference is not so clear-cut, since even with \(t=0\), once we correct for multiplicity, we find no significant correlations between the \(\texttt{ROIs}\).
In the same way as before, we compute, with \(t=0.281\), the adjacency matrices with which we will create the graphs. Of course we do this only for the confidence intervals calculated with \(\alpha=0.05\), since we know that with Bonferroni no significant correlations are found.
Let’s also check if the most resilient correlations have changed now that we are using the Partial Correlation.
| ROIs in ASD subject | ROIs in TD subject |
|---|---|
| 15-79 | 2-58 |
| 85-93 | 22-89 |
| 29-73 | |
| 50-54 | |
The pairs are all different; however, there is one particular case: the pair of \(\texttt{ROIs}\) 29-73, which with the Pearson correlation was significant in the \(\texttt{TD}\) group, with the Partial Correlation turns out to be significant in the \(\texttt{ASD}\) group. Furthermore, the counts of significant correlations are reversed: with Pearson we found more for \(\texttt{TD}\) (5 for \(\texttt{TD}\) and 3 for \(\texttt{ASD}\)), while with the Partial Correlation we find more for \(\texttt{ASD}\) (4 for \(\texttt{ASD}\) and 2 for \(\texttt{TD}\)).
Let’s now go on to visualize the graphs deriving from the analyses carried out. Naturally, as previously said, we will only use the confidence intervals constructed with \(\alpha=0.05\); otherwise we would get graphs without edges.
From the comparison with the previous graphs for the Pearson Correlation, the differences are immediately noticeable, but let’s create one last plot to visualize even better the differences between the two correlation methods.
To summarize the significant correlations of the different groups we will use colors that echo those used in the previous graphs.
Conclusion
From the comparison between the two methods we can say that both seem to suggest a difference between the two groups, as the significant activations are almost all different between them. The problem, however, is that with the Partial Correlation we cannot apply the Bonferroni correction for multiplicity, since after it no significant correlations remain between the various \(\texttt{ROIs}\). In part this is foreseeable from the theory: by removing the effect of the other variables, and therefore reducing the noise, we are measuring the “true”, intrinsic correlation between pairs of \(\texttt{ROIs}\). Still, we did not expect such a large difference; this is perhaps due to a dataset that is not ideal for this kind of analysis, since the number of rows \(n=145\) is only slightly larger than the number of columns \(D=116\), which makes the regressions behind the residuals very unstable.